perm filename TEXSEM.SAV[TEX,DEK] blob sn#427773 filedate 1979-03-23 generic text, type T, neo UTF8
comment This was page 13 of TEXSEM before I made a major change on March 23 1979;

comment The paragraph builder: hangwidth,hangbegin,justification,finishparagraph;

comment When a paragraph is initiated, the semantics routine begins to form
an hlist in hmode. Indentation at the beginning of the paragraph, if any,
is inserted explicitly into the list, as a box that has width but no
height or depth or associated list. Hanging indentation is controlled
by three global variables, "hanglength" and "hangbegin" and "hangfirst";

internal real hangwidth # amount to indent lines (negative if at right margin);
internal integer hangbegin # number of lines to wait before indentation changes;
internal boolean hangfirst # does hanging indent occur before hangbegin or not;
integer inhangbegin; boolean inhangfirst; real inhangwidth # for boxed paragraphs;
integer lines # number of lines output by justification procedure;
real lastwidth # width of final line output by justification procedure;
real justpar # same as jpar but type real;

comment The major computation associated with paragraph building is the
breaking of lines done by procedure "justification", which has the following
parameters:
	real initwidth		line width
	integer linechange	specifies k after which hanging indentation changes
	real hangwidth		if hangfirst, indentation on lines numbered ≤k
				otherwise,   indentation on lines numbered >k
	boolean penlt		true if final-widow penalty to be applied
(Note that if hangwidth is 0, the values of hangfirst and linechange don't matter.)

Procedure "justification" takes the hlist headed by temphead and appends it to
the current list. This procedure also computes the following two quantities:
	integer lines		the number of lines output
	real lastwidth		the width of the final line, including possible
					hanging left indentation
(The displayed formula routine uses "lastwidth" to decide whether or not to
omit a line of blank space.)

The justification routine reads through the entire paragraph before deciding
the best places to break, since it is often better to make a slightly bad
break at the beginning of a paragraph to make things better later on. In
general, the algorithm essentially minimizes the sum of the squares of the
individual badness ratings (plus 1000 if there's a hyphenation on the penultimate
line) subject to the condition that no line has badness exceeding 200.* To
do this it maintains a list of "break nodes" containing the following fields:
	integer lineno		number of lines so far, up to this break
	real width,gluestretch,glueshrink	accumulated totals so far
	integer curbrk		points to place in hlist after which break occurs
	real totbad		accumulated sum of badness squares (times .0001)
	integer prevbrk		points to breaknode for best preceding break
	real target		width at which the next line would break ideally
Furthermore the first word of each break record points to the previous break
record, sorted in order of the target fields. The value in totbad is the
minimum possible given a break at the stated position, and the link in prevbrk
tells how to achieve this minimum. Two pointers are maintained to the breaknode
list, namely topopen and botopen. The most recent node (the one with highest
target value) is pointed to by topopen, and by following the links from this
node to the one pointed to by botopen (inclusive) we obtain all breaks whose
targets are in the range that leads to finite badness. In typical circumstances,
there will be comparatively few break nodes between topopen and botopen, so
comparatively little computation will be needed to rate the desirability of
a permissible break. Permissible breaks occur at glue nodes immediately following
anything but a rule, glue, penalty, eject, or "whatsit" node, provided that
such breaks have not been suppressed by hyphenation control nodes, and we can
also break at discretionary hyphens, penalty nodes, or eject nodes. In fact,
an eject node is forced to be a break, no matter how bad the result.

Here's a sort of example of breaking:
	abc defgh ijk lmnop qrst uvwx
	yzab cdefghi jklmn opqr stu
Suppose the line width is such that all breaks up to qrst have badness exceeding
200, but to break after qrst or uvwx is feasible. Then two break nodes will
be created, one pointing to the glue following qrst and one pointing to the
glue following uvwx. (The pointer in curbrk is to a glue node when breaking
"before" the glue, but to a discretionary hyphen or penalty or eject node
otherwise. Suppose we have the sequence
	a, b, \-, <glue>, c,
then to point at the discretionary hyphen (/-) means to break and to
include a hyphen there, but to point at the glue node after the \-
means to break after ab with no hyphen.) In the break node for qrst, the total
width,gluestretch, and glueshrink are computed up to the beginning of the
following line, not simply up to the point of break. Thus, if several consecutive
glue nodes will be eliminated by this break, their space and variability are
included in the totals, since these totals are used to compute the badness
of the following line (by subtracting the totals up to another attempted break).
Both break nodes for qrst and uvwx will have prevbrk pointing back to the
initial break node (which stands at the very beginning of the paragraph), and
topopen will point to the uvwx one. botopen will move from the initial break
node to the qrst node when the next break yzab is tried, since there will be
no further chances to use it as the prevbrk. The target value of qrst might
make it feasible to break after jklmn, so it appears that the number of open
breaks will grow as there as more and more lines. However, long words will
tend to cut down the number of possibilities. A word is passed to the
hyphenation procedure for insertion of discretionary hyphens only if it
has five or more letters and straddles the target of some open break node.

This routine was developed jointly by D. E. Knuth and M . F. Plass.

*Actually "200" in this discussion is short for 100*jpar, where jpar is
normally 2 but can be changed by the \chpar command;

internaldef prevbrk(p)=⊂mem[p+4]⊃ # break node for best previous break leading here;
internaldef lineno(p)=⊂mem[p+5]⊃ # number of lines up to this break;
internaldef curbrk(p)=⊂mem[p+6]⊃ # hlist position of this break;
internaldef target(p)=⊂memreal(p+7)⊃ # best curwd for the next break after here;
internaldef totbad(p)=⊂memreal(p+8)⊃ # best sum of badness↑2 up to here;
internaldef breaknodesize=9 # number of words in a break node;

comment The following global variables are not currently used by other routines
in TEX, but they are declared global anyway in case some extension wants to
refer to them;

internal boolean autobreaking # automatic line breaking not shut off by hyphnode;
internal real curwd,curst,cursh # current total width, stretch, and shrink;
internal integer topopen,botopen # boundary of the active break nodes;

internal procedure justification(real initwidth; integer linechange;
	boolean hangfirst;
	real hangwidth; boolean penlt) # routine to break hlists almost optimally;
begin real secondwidth # initwidth - |hangwidth|;
integer q,p,prevp,n,t,ll,r;
integer bestplace; real bestscore,correction;
boolean notwarned # warning message has not been issued;

simp procedure trybreak(real penalt,w) # decides if p might be a
	reasonable place to break, and updates the break node table;
begin comment This procedure is called when p points to a permissible break
in an hlist, corresponding to an additional penalty as specified.
(The value of penalt is .01 times the value in TEX's writeup,
since there's really no need to multiply by 100 when computing badness scores.)
Parameter w is the current width to be used in badness calculations--it
differs from curwd if an inserted hyphen would appear at the break.
If this is a potentially useful break, a new break node is created. The
value of botopen is adjusted to "close" break nodes that are no longer relevant;
integer r,prevr; real t,glue,badness;
r←topopen; prevr←0; bestplace←0; bestscore←10.0↑30;
while true do
	begin comment Look at all open break nodes to see if there are any which
	lead to a badness≤200 at the current position, and close break nodes
	using the fact that target(mem[r])≤target(r);
	label nogood;
	t←target(r);
	if w>t then
		begin glue←cursh-glueshrink(r);
		if glue≤0.0001 then glue←.0001;
		if w>t+glue then
			if prevr then
				begin botopen←prevr; done;
				end
			else	begin comment A bad break was inevitable;
				bestplace←botopen←r;
				bestscore←0.0 # any value is ok here;
				done;
				end
		else badness←((w-t)/glue)↑3;
		end
	else	begin glue←curst-gluestretch(r);
		if gluestretch(r)>1000000.0 and notwarned then
			begin comment floating-point can't handle this;
			error("Too much stretch for proper line breaking");
			notwarned←false;
			end;
		if glue≤0.0001 then glue←.0001;
		badness←((t-w)/glue)↑3;
		end;
	if badness>justpar then
		begin if penalt>-10.0↑30 then go to nogood;
		if badness>10.0↑19 then badness←10.0↑19 # prevent overflow;
		end;
	if penalt≥0.0 then badness←(badness+penalt+.01)↑2+totbad(r)
	else if penalt>-10.0↑30 then badness←(badness+.01)↑2-penalt↑2+totbad(r)
	else	begin comment this case is for "eject";
		badness←(badness+.01)↑2+totbad(r);
		if bestplace=0 and badness>bestscore then badness←bestscore;
		end;
	if type(curbrk(r))=discnode and type(p)=discnode then
		badness←badness+penpen/10000 # additional penalty for two
			hyphens in a row;
	if badness≤bestscore then
		begin bestplace←r; bestscore←badness;
		end;
	nogood: if r=botopen then done;
	prevr←r; r←mem[r];
	end;
if bestplace then
	begin comment A new break was found and bestplace is the best lead-in;
	integer q,t; q←getnode(breaknodesize);
	lineno(q)←lineno(bestplace)+1;
	prevbrk(q)←bestplace;
	totbad(q)←bestscore;
	curbrk(q)←r←p;
	width(q)←curwd;glueshrink(q)←cursh;gluestretch(q)←curst;
	while r do
		begin case type(r) of begin
		[gluenode] begin t←value(r);
		width(q)←width(q)+gluespace(t);
		gluestretch(q)←gluestretch(q)+gluestretch(t);
		glueshrink(q)←glueshrink(q)+glueshrink(t); end;
		[penaltynode][discnode];
		[kernnode] width(q)←width(q)+gluespace(r);
		[ejectnode] if r≠p then done;
		else done
		  end;
		r←link(r);
		end;
	if lineno(q)<linechange then target(q)←width(q)+initwidth
		else target(q)←width(q)+secondwidth;
	if target(q)≥target(topopen) then
		begin mem[q]←topopen; topopen←q;
		end
	else	begin comment This case can arise only in weird circumstances
		due to changing line lengths or boxes of negative width;
		integer p;
		p←topopen;
		while p≠botopen and target(mem[p])>target(q) do p←mem[p];
		mem[q]←mem[p]; mem[p]←q;
		if p=botopen then botopen←q;
		end;
	end;
end;

comment The justification procedure begins here;
lines←0; lastwidth←1000.0; notwarned←true;
p←mem[temphead]; if p=0 then return;
autobreaking←true;
secondwidth←initwidth-abs(hangwidth);
if hangfirst then initwidth↔secondwidth;
curwd←curst←cursh←0.0;
prevp←p # prevp doesn't have to point to the node before p, it should only
	point to a node such that if node p is a glue node the test below
	for breaking at p is correct;
topopen←botopen←getnode(breaknodesize);
curbrk(topopen)←temphead;
target(topopen)←if linechange>0 then initwidth else secondwidth;

while p do
	begin comment We go through the hlist calling trybreak at each
	permissible break;
	case type(p) of begin
	[charnode] begin integer c; c←info(p);
	curwd←curwd+charwd((c lsh -7),fontinfo[c]); end;
	[hlistnode][vlistnode][rulenode] curwd←curwd+width(p);
	[whatsitnode] justext(p);
	[gluenode] begin if autobreaking then
		begin case type(prevp) of begin
		[charnode][hlistnode][vlistnode][hyphnode][discnode][insnode]
		trybreak(0.0,curwd);
		else
		  end;
		end;
	t←value(p);
	curwd←curwd+gluespace(t);
	curst←curst+gluestretch(t);
	cursh←cursh+glueshrink(t);
	if autobreaking then
		begin comment Check to see if possible hyphenation should be
		tested, namely if the next nodes are lower case letters of the
		same font, straddling the target of some open break, and
		followed either by glue or by "." or "," of the same font;
		integer q;
		if type(q←link(p))=charnode then
			begin integer t,f,fa,fz,n; real s;
			label nohyph # go here if hyphenation not to be tried;
			t←ufield(info,mem[q]);
			fa←t land('37 lsh(infod+7))+("a" lsh infod) # "a" in font;
			fz←fa+(26 lsh infod) # just after "z" in font;
			n←0 # n will be the number of letters passed;
			s←curwd # s will be width including lookahead;
			comment Admissible nodes are kern nodes or lie in the
				interval [fa,fz) including their link field;
			while q do
				begin if mem[q]≥fz then
					begin if type(q)≠kernnode then done;
					s←s+gluespace(q) # adjust current width;
					end
				else	begin if mem[q]<fa then done;
					comment lowercase letter found;
					t←info(q);
					s←s+charwd((t lsh -7),fontinfo[t]);
					n←n+1;
					end;
				q←link(q);
				end;
			if n<5 or q=0 then go to nohyph;
			case type(q) of begin
			[charnode] if sftable[(info(q))land '177] = 1.0
			then go to nohyph;
			[gluenode];
			else go to nohyph
			  end;
			r←topopen;
			while s≤target(r) do
				begin if r=botopen then go to nohyph;
				r←mem[r];
				end;
			if curwd≥target(r) then go to nohyph;
			hyphenate(p←link(p),n,
				fa+((("-"-"a")lsh infod)+(discnode lsh typed)));
			continue # resume the "while p" loop;
			nohyph: curwd←s;
			end;
		prevp←link(p); p←q; continue;
		end;
	end # end of the [gluenode] case;
	[leadernode][insnode];
	[kernnode] curwd←curwd+gluespace(p);
	[hyphnode] autobreaking←ufield(value,mem[p]);
	[penaltynode] begin short integer n; n←penalty(p);
	if n<(1000 min infpen) then trybreak(n/100.0,curwd) end;
	[discnode] begin t←value(p);
	comment Hyphenation penalty is hpen, and we must consider its width;
	trybreak(hpen/100,curwd+charwd((t lsh -7),fontinfo[t])) end;
	[ejectnode] begin trybreak(-10.0↑30,curwd) # this opens a new
	break for sure; botopen←topopen end # and we also close all the others;
	else confusion
	  end;
	prevp←p; p←link(p);
	end;

comment Now find the best break that's currently open;
q←topopen; bestscore←10.0↑30; bestplace←topopen;
while true do
	begin if type(curbrk(q))=discnode then
		totbad(q)←totbad(q)+penpen/10000 # additional badness
		for hyphenating the penultimate line;
	if totbad(q)<bestscore then
		begin bestplace←q; bestscore←totbad(q);
		end;
	if q=botopen then done;
	q←mem[q];
	end;
ll←lineno(bestplace);
if link(curbrk(bestplace)) then
	begin comment the last break does not remove the fillglue at end;
	ll←ll+1; correction←gluestretch(fillglue);
	end
else correction←0.0;

comment Now we have found the best sequence of breaks, namely to break into ll
lines, the last break coming at "bestplace" and the break nodes telling where
to break after that. The next step is to link up the break nodes in forward
order, using their (now redundant) lineno fields for this purpose;
define nextbrk(q)=⊂lineno(q)⊃;
q←0;
while bestplace do
	begin nextbrk(bestplace)←q;
	q←bestplace; bestplace←prevbrk(q);
	end;


comment During the finishing-up phase, which breaks up the hlist into packages
that go into the output vlist, q points to the break node specifying the end
of the current line (or 0 if the current line is the last), and temphead heads
the hlist of nodes remaining to be output;
q←nextbrk(q); lines←0;
while mem[temphead] do
	begin boolean brokenline # does this line end with an inserted hyphen;
	brokenline←false;
	if q then
		begin comment break the hlist;
		r←curbrk(q);
		case type(r) of begin
		[gluenode] begin delgluelink(value(r));
		setfield(value,mem[r],zeroglue);end # glue break becomes zero glue;
		[discnode] begin brokenline←true;
		mem[r]←mem[r]-((discnode-charnode)lsh typed)# change to charnode;
		end;
		else comment ignore all other node types;
		  end;
		t←link(r); setlink(r,0);
		comment Now prune unwanted nodes at the break;
		while t do
			begin integer tt; case tt←type(t) of begin
			[gluenode] delgluelink(value(t));
			[kernnode][penaltynode][discnode];
			else done
			  end;
			r←link(t);
			if tt=kernnode then freenode(t,kernnodesize)
			else freeavail(t);
			t←r;
			end;
		end;
	r←hpackage(temphead,if (lines←lines+1)>linechange then secondwidth
		else initwidth,0);
	if lines=ll then
		begin lastwidth←width(r);
		if glueset(r)>0 then
			lastwidth←lastwidth-glueset(r)*correction;
		end;
	if ragged then glueset(r)←glueset(r)*(100.0/(100.0+ragged));
	if hangwidth>0 and
	((hangfirst and lines≤linechange)or(not hangfirst and lines>linechange))then
		 begin shiftamt(r)←hangwidth;
		if lines=ll then lastwidth←lastwidth+hangwidth;
		end;
	append(r);
	if mode>0 then
		begin comment If being used by the page builder, insert the
		topinsert, botinsert, and eject nodes removed from the line
		by hpackaging, then check if there is any special penalty for
		breaking after this line;
		integer pen;
		if mem[inserts] then
			begin mem[curnode]←mem[curnode]+mem[inserts];
			curnode←mem[inserts];
			while link(curnode) do curnode←link(curnode);
			end;
		if (lines=1 and ll>1) or (penlt and lines=ll-1) then
			pen←wpen lsh valued
		else pen←0;
		if brokenline then pen←pen+(bpen lsh valued);
		if pen then store((penaltynode lsh typed)+pen);
		end
	else dsnodelist(mem[inserts]);
	if q=0 then done;
	mem[temphead]←t;
	q←nextbrk(q);
	end;

comment Finally, free the list of break nodes;
while topopen do
	begin q←mem[topopen];
	freenode(topopen,breaknodesize);
	topopen←q;
	end;
end;

simp procedure finishparagraph(boolean penlt);
begin comment This procedure is invoked when the paragraph-so-far ends or
is followed by a displayed equation;
if type(curnode)=gluenode then
	begin comment remove space at paragraph end and replace it by hfill;
	delgluelink(value(curnode)); mem[curnode]←fillgluespec;
	end
else store(fillgluespec) # append hfill to paragraph-so-far;
mem[temphead]←mem[head] # get ready for justification;
popnest # return to vmode of the page builder;
justification(pagemem[hsizemem],hangbegin,hangfirst,hangwidth,penlt) # append 
	justified paragraph to the page contribution list;
end;